ANALHITZA: a tool to extract linguistic information from large corpora in Humanities research

Authors

  • Arantxa Otegi
  • Oier Imaz
  • Arantza Díaz de Ilarraza
  • Mikel Iruskieta
  • Larraitz Uria
Abstract

Corpora in some areas of research remain small because tools for processing the language under study easily and at scale are lacking. In this article, we present ANALHITZA, a tool being developed within the Clarink project, whose aim is to create linguistic technologies that are useful for research in the Social Sciences and Humanities. ANALHITZA is designed to extract linguistic information online from large corpora in an easy way. It is also multilingual and can process texts written in three languages: Basque, Spanish and English. We also present three real case studies in which ANALHITZA has been used. The tool can be redesigned or changed according to the needs of the scientific community in the field of Humanities.

Similar resources

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Annotating Corpora from Various Sources in the Humanities Domain

Voula Giouli. Annotating corpora from various sources in the humanities domain: shortcomings and issues. In this paper, we present work aimed at the linguistic annotation of Greek corpora that belong to the humanities domain, the focus being on the methodological principles as well as the implementation framework adopted. This framework builds on an existin...

Matrix: a statistical method and software tool for linguistic analysis through corpus comparison

Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. A thesis submitted to Lancaster University for the degree of Ph.D. in Computer Science. Paul Edward Rayson, B.Sc. September 2002. This thesis reports the development of a new kind of method and tool (Matrix) for advancing the statistical analysis of electronic corpora of linguistic data. First, we des...

Draft WebCorp: providing a renewable data source for corpus linguists

The many electronic text corpora available nowadays present ever fewer obstacles to a wide range of corpus linguistic study. However, corpora are expensive resources to create and to update, and there remain problems for linguists if they seek access to very large, very recent, or changing language. The World Wide Web, whilst intended as an information source, is an obvious resource for the ret...

Towards the Spatial Analysis of Vague and Imaginary Places: Evolving the Spatial Humanities through Medieval Romance

The establishment of the field of Spatial Humanities testifies to the success in the use of technologies such as Geographic Information Systems (GIS) for the analysis of texts in Humanities. Although the increasing volume of projects can be regarded as a sign of advance, an important challenge has remained unsolved in this field and it has been barely addressed. The majority of research dealing...

Journal:
  • Procesamiento del Lenguaje Natural

Volume 58

Publication date: 2017